Forward stepwise: In this approach, confounder covariates are added to the model one at a time, over a series of iterative models. If a covariate does not meet the rules to be kept in the model, it is removed and never considered again. Imagine you were fitting a regression model with one exposure covariate and eight candidate confounders. Suppose that you add the first confounder along with the exposure and it meets the modeling rules, so you keep it. But when you add the second confounder, it does not meet the rules, so you leave it out. You keep doing this until you run out of candidates. Although forward stepwise can work if you have very few variables, most analysts do not use this approach because it has been shown to be sensitive to the order in which you choose to enter the variables.
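To make the process concrete, here is a minimal sketch of the forward stepwise routine described above, assuming a linear model fit with statsmodels on simulated data; the variable names, the made-up data, and the p < 0.05 "keep" rule are illustrative assumptions, not recommendations.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: one exposure, eight candidate confounders, one outcome
rng = np.random.default_rng(0)
n = 500
confounder_names = [f"conf{i}" for i in range(1, 9)]
df = pd.DataFrame(rng.normal(size=(n, 9)),
                  columns=["exposure"] + confounder_names)
df["outcome"] = 2 * df["exposure"] + 1.5 * df["conf1"] + rng.normal(size=n)

kept = []                                  # confounders retained so far
for candidate in confounder_names:
    X = sm.add_constant(df[["exposure"] + kept + [candidate]])
    fit = sm.OLS(df["outcome"], X).fit()
    if fit.pvalues[candidate] < 0.05:      # modeling rule: keep if significant
        kept.append(candidate)
    # otherwise the candidate is dropped and never tried again

print("Retained confounders:", kept)
```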
Backward elimination: In this approach, the first model you run contains all your potential covariates, including all the candidate confounders and the exposure. Each time you run the model, you remove (eliminate) the confounder contributing the least to the model. You decide which one that is based on modeling rules you set (such as removing the confounder with the largest p value). Theoretically, after you pare away the confounders that do not meet the rules, you will have a final model. In practice, this process can run into problems if you have collinear covariates (see Chapters 17 and 18 for a discussion of collinearity). Your first model, packed with all your potential covariates, may error out or fail to converge for this reason. Also, it is not clear whether, once you eliminate a covariate, you should ever try it in the model again. This approach often sounds better on paper than it works in practice.
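Here is a matching sketch of backward elimination. It reuses the simulated data frame df and the imports from the previous sketch, and again treats a p value of 0.05 or higher as the (illustrative) rule for removing a confounder; the exposure itself is never a candidate for removal.

```python
confounders = [f"conf{i}" for i in range(1, 9)]
while confounders:
    X = sm.add_constant(df[["exposure"] + confounders])
    fit = sm.OLS(df["outcome"], X).fit()
    pvals = fit.pvalues[confounders]       # ignore the intercept and the exposure
    worst = pvals.idxmax()                 # confounder contributing the least
    if pvals[worst] < 0.05:                # everything left meets the rule, so stop
        break
    confounders.remove(worst)              # eliminate it and refit

print("Retained confounders:", confounders)
```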
Stepwise selection: This approach combines the best of forward stepwise and backward elimination. Starting with the same set of candidate covariates, you choose which covariate to introduce first into a model with the exposure. If this covariate meets the modeling rules, it is kept; if not, it is left out. This continues as if you are doing forward stepwise, but then there's a twist. After you are done trying each covariate and you have your forward stepwise model, you go back and try to add the covariates you left out, one by one. Each time one of them fits back in, you keep it and consider it part of the working model. It is during this phase that collinearity between covariates can become very apparent. After you have retried the covariates you originally left out and are satisfied that you added back the ones that meet the modeling rules, you can declare that you have arrived at the final model.
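A minimal sketch of the add-back phase appears below; it picks up where the forward stepwise sketch left off (the same simulated df, the kept list, and the illustrative p < 0.05 rule).

```python
candidates = [f"conf{i}" for i in range(1, 9)]
left_out = [c for c in candidates if c not in kept]   # dropped in the forward pass

for candidate in left_out:
    X = sm.add_constant(df[["exposure"] + kept + [candidate]])
    fit = sm.OLS(df["outcome"], X).fit()
    if fit.pvalues[candidate] < 0.05:      # it fits now that other covariates are in
        kept.append(candidate)             # add it back to the working model

final = sm.OLS(df["outcome"],
               sm.add_constant(df[["exposure"] + kept])).fit()
print("Final model covariates:", ["exposure"] + kept)
```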
Once you produce your final model, check the p value for the covariate or covariates representing your exposure. If they are not statistically significant, it means that your data did not support your hypothesis: after controlling for confounding, your exposure was not statistically significantly associated with the outcome. However, if the p value is statistically significant, then you would move on to interpret the results for your exposure covariates from your regression model. After controlling for confounding, your exposure was statistically significantly associated with your outcome. Yay!
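Continuing the sketch above, checking the exposure's p value in the final fitted model might look like this (the 0.05 cutoff is again an illustrative choice):

```python
p_exposure = final.pvalues["exposure"]
if p_exposure < 0.05:
    print(f"Exposure is statistically significant (p = {p_exposure:.3g})")
else:
    print(f"Exposure is not statistically significant (p = {p_exposure:.3g})")
```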
Use a spreadsheet to keep track of each model you run and a summary of its results. Save this in addition to your computer code for running the models. It can help you communicate with others about why certain covariates were or were not retained in your final model.
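If you prefer to build that tracking sheet programmatically, one possibility (continuing from the sketches above, with a hypothetical file name) is to log a summary row for each model and write the rows out as a CSV:

```python
model_log = []

def log_model(label, fit, note=""):
    """Record which covariates a fitted model used and how it was judged."""
    model_log.append({"model": label,
                      "covariates": ", ".join(fit.model.exog_names),
                      "n_observations": int(fit.nobs),
                      "note": note})

log_model("final model", final, note="all retained confounders met the p < 0.05 rule")
pd.DataFrame(model_log).to_csv("model_tracking.csv", index=False)   # hypothetical file name
```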